Revised Elements:
Instead of using valence and tempo, the plot now displays instrumentalness (x-axis) and arousal (y-axis), which were identified as more important features by the random forest model. Danceability is used for point size, suggesting it plays a significant role in distinguishing AI vs. Non-AI music.
Analysis:
There is a negative correlation between instrumentalness (x-axis) and arousal (y-axis). As instrumentalness increases, arousal tends to decrease. This suggests that tracks with more instrumental content tend to be less energetic.
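The strength of such a relationship can be summarised with a Pearson correlation coefficient. The following is a minimal sketch in plain Python; the function name `pearson_r` and the five data points are hypothetical illustrations, not values from the corpus.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (instrumentalness, arousal) values, NOT the corpus data.
instrumentalness = [0.1, 0.3, 0.5, 0.7, 0.9]
arousal = [6.0, 5.5, 5.1, 4.2, 3.9]

r = pearson_r(instrumentalness, arousal)
print(f"r = {r:.2f}")  # a negative r means arousal falls as instrumentalness rises
```

A value close to -1 would indicate a strong negative linear relationship, while a value near 0 would indicate that the visual trend is weak.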
AI-generated tracks appear more concentrated at higher instrumentalness values (0.7-0.9) and lower arousal values. Non-AI tracks are more spread out, with some occurring at lower instrumentalness values and higher arousal values. This suggests that AI music may favor more instrumental, low-energy compositions.
Danceability is mapped to point size, showing variation across AI and Non-AI tracks. Larger points (high danceability) are spread throughout, meaning danceability does not show a strong trend with instrumentalness or arousal.
One interpretation is that AI-generated music is more predictable and structured, favoring instrumental, low-energy tracks, while Non-AI music covers a broader range, possibly reflecting human creativity and varied emotional expression.
The confusion chart on the left shows the performance of a nearest-neighbor classifier trained to distinguish AI-generated from non-AI-generated music. The features selected as most important for classification were instrumentalness, danceability, arousal, valence, and tempo.
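To illustrate how a nearest-neighbor classifier and its confusion counts fit together, here is a minimal sketch in plain Python. It is not the model used for the chart: the helper names, the choice of k = 3, the two-feature points, and the labels are all hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to `query`."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def confusion_counts(pairs):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(pairs)


# Hypothetical (instrumentalness, danceability) points, NOT the corpus data.
train = [
    ((0.85, 0.40), "AI"),
    ((0.80, 0.55), "AI"),
    ((0.75, 0.35), "AI"),
    ((0.20, 0.70), "Non-AI"),
    ((0.35, 0.60), "Non-AI"),
    ((0.10, 0.45), "Non-AI"),
]
held_out = [
    ((0.78, 0.50), "AI"),
    ((0.25, 0.65), "Non-AI"),
    ((0.45, 0.50), "AI"),  # sits between the two clusters
]

pairs = [(actual, knn_predict(train, point)) for point, actual in held_out]
confusion = confusion_counts(pairs)
for (actual, predicted), count in sorted(confusion.items()):
    print(f"actual={actual:7s} predicted={predicted:7s} count={count}")
```

In this toy data the third held-out point is labelled AI but lies closer to the Non-AI cluster, so it ends up in an off-diagonal cell of the confusion counts. Those off-diagonal cells are what a confusion chart makes visible.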
There is a clear peak around 90-100 BPM, suggesting that this is the most frequent tempo range in the class corpus. It is also interesting that there are multiple peaks, suggesting that there is a diverse set of tempos rather than a single dominant one. There are also some tracks in the lower tempo range (below 50 BPM), possibly due to half-time interpretations.
The compmus_energy_novelty() function estimates novelty
based on sudden changes in loudness over time. It detects significant
shifts in energy levels, which are useful for identifying musical onsets
and transitions. For this particular track, there is an evident onset at
approximately 10 seconds, with smaller onsets at 115 and 125
seconds.
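The idea behind an energy-based novelty curve can be sketched in a few lines of plain Python: compute the energy of each frame, take the logarithm, and keep only the increases between consecutive frames. This is an illustrative assumption about the general technique, not the implementation inside compmus_energy_novelty(); the function names and the toy signal are hypothetical.

```python
from math import log


def frame_energy(samples, frame_size):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def energy_novelty(energies, eps=1e-10):
    """Half-wave rectified difference of log energy between consecutive frames."""
    log_e = [log(e + eps) for e in energies]
    # Only increases in energy count as novelty; decreases are clipped to zero.
    return [max(0.0, later - earlier) for earlier, later in zip(log_e, log_e[1:])]


# Hypothetical signal: two quiet frames, two loud frames, one quiet frame.
signal = [0.01] * 8 + [0.9, -0.9] * 4 + [0.01] * 4

novelty = energy_novelty(frame_energy(signal, frame_size=4))
peak_index = novelty.index(max(novelty))
print(novelty)
print("largest jump between frames", peak_index, "and", peak_index + 1)
```

The single large value in the novelty list marks the frame boundary where the loud burst begins, which is the kind of event that appears as an onset peak in the plot.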
The compmus_spectral_novelty() function approximates
spectral novelty by analyzing cepstrograms, which represent changes in
the frequency content of a signal, detecting harmonic or timbral shifts
better than energy-based novelty. Because this visualization seems more
consistent than that of the energy novelty, the track appears to have
less spectral variation than energy variation.
The regular tempogram shows several tempo harmonics, visible as lines at 150, 300, and 450 BPM. This suggests that the music contains strong rhythmic subdivisions that reinforce multiples of the base tempo. The strong presence of harmonics points to a steady, rhythmic beat structure, likely because it is an electronic / techno track. The lack of tempo variation over time indicates that the tempo remains stable throughout the track. The cyclic tempogram, on the other hand, shows a dominant tempo of around 150 BPM, with a weaker secondary component at around 110 BPM.
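A cyclic tempogram treats tempi that differ by a factor of two as equivalent, so each line of the regular tempogram is folded into a single tempo octave. The sketch below shows that folding in plain Python; the function name `fold_tempo` and the reference octave starting at 80 BPM are assumptions chosen for illustration.

```python
def fold_tempo(bpm, low=80.0):
    """Fold a tempo into the octave [low, 2 * low) by halving or doubling."""
    if bpm <= 0:
        raise ValueError("tempo must be positive")
    while bpm >= 2 * low:
        bpm /= 2
    while bpm < low:
        bpm *= 2
    return bpm


# The three harmonic lines read off the regular tempogram.
for harmonic in (150, 300, 450):
    print(harmonic, "BPM folds to", fold_tempo(harmonic), "BPM")
```

Under these assumptions 300 BPM folds onto 150 BPM, while 450 BPM folds to 112.5 BPM. One possible reading, offered tentatively, is that the weaker component near 110 BPM in the cyclic tempogram is the folded third harmonic rather than an independent tempo.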
Using the 1–0 coding for the chord templates, I generated a keygram
for Reinout’s second track. This keygram makes use of the new helper
function compmus_match_pitch_templates, which compares the
averaged chroma vectors against templates. In general, a keygram shows
the progression of chords over time by matching chroma features (pitch
class profiles) to predefined chord templates. The visualization
shows which chords are most likely at each point in time. For the
chosen track, dark colors, such as at the start of the track and in
the 70-80 second range, indicate a small distance between the observed
chroma and a template, that is, a close match. Keygrams help identify
harmonic progressions, modulations, and changes in harmony over
time.
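The template matching described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the code of compmus_match_pitch_templates: it builds binary (1–0) major and minor triad templates for all twelve roots, normalises both the chroma vector and each template, and ranks templates by Euclidean distance. All names and the example chroma vector are hypothetical.

```python
from math import sqrt

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_TRIAD = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # root, major third, fifth
MINOR_TRIAD = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]  # root, minor third, fifth


def rotate(template, steps):
    """Shift a 12-element template up by `steps` semitones."""
    return template[-steps:] + template[:-steps] if steps else list(template)


def normalise(vec):
    """Scale a vector to unit Euclidean length (all-zero vectors are returned as-is)."""
    norm = sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


def chord_templates():
    """Binary templates for the 24 major and minor triads."""
    templates = {}
    for root, name in enumerate(NOTE_NAMES):
        templates[f"{name}:maj"] = rotate(MAJOR_TRIAD, root)
        templates[f"{name}:min"] = rotate(MINOR_TRIAD, root)
    return templates


def match_chroma(chroma, templates):
    """Return (name, distance) pairs sorted from best to worst match."""
    c = normalise(chroma)
    scored = []
    for name, template in templates.items():
        t = normalise(template)
        distance = sqrt(sum((a - b) ** 2 for a, b in zip(c, t)))
        scored.append((name, distance))
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical averaged chroma with most energy on C, E and A.
chroma = [0.8, 0.0, 0.0, 0.0, 0.7, 0.0, 0.0, 0.1, 0.0, 0.9, 0.0, 0.0]
ranking = match_chroma(chroma, chord_templates())
print(ranking[:3])
```

The smallest distance corresponds to the darkest cell in a column of the visualization. Repeating the match for every time window produces one column per window.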
This is the same keygram generated using Temperley’s proposed improvements. It reveals broadly similar insights, but in general Temperley’s improvements yield clearer, more stable key regions, assign higher weights to stable scale degrees (tonic, dominant), and reflect more natural tonal hierarchies.
Chroma- vs. Timbre-based self-similarity:
Timbre features, often represented by MFCCs (Mel-Frequency Cepstral Coefficients), capture the spectral characteristics of the sound. This timbre-based self-similarity chart highlights instrumental changes and shifts in overall sound quality / production. The effectiveness of chroma- or timbre-based self-similarity for structural analysis depends on the characteristics of the track. Chroma-based self-similarity captures harmonic progressions, tonal structure, key changes, and chord progressions, and so provides a clearer structural picture for tracks with rich harmonic content, such as pop, jazz, and classical music. Timbre-based self-similarity captures instrumental texture and sound quality, outlining changes in orchestration, dynamics, and articulation. Because my chosen track is an electronic / EDM song, its timbre features are at the forefront, making the timbre-based self-similarity chart more insightful. The chart portrays the track’s repeated instrumental sections as prominent diagonal lines, with sudden shifts indicating timbral changes (such as the introduction of new instruments).
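A self-similarity matrix compares every frame of a feature sequence with every other frame. The sketch below, in plain Python, uses cosine distance on a toy sequence of three-dimensional feature vectors. The function names and the frames are hypothetical stand-ins for real chroma or MFCC frames.

```python
from math import sqrt


def cosine_distance(a, b):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def self_similarity(frames):
    """Matrix of pairwise distances between all frames."""
    return [[cosine_distance(a, b) for b in frames] for a in frames]


# Hypothetical feature frames: section A, A again, a contrasting B, then A.
frames = [
    [1.0, 0.2, 0.0],
    [1.0, 0.2, 0.0],
    [0.0, 0.3, 1.0],
    [1.0, 0.2, 0.0],
]

ssm = self_similarity(frames)
for row in ssm:
    print(" ".join(f"{value:.2f}" for value in row))
```

Frames from the repeated section have a distance near zero to each other, even when they are far apart in time, while the contrasting frame stands out with a large distance. In a full-length track, runs of such near-zero values off the main diagonal form the diagonal lines that indicate repeated sections.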
Welcome to my Computational Musicology portfolio for 2025! This storyboard contains my perspective on the examples from each week.
This is the bad visualisation of the AI Song Contest we used in our first lab session, this time in a dashboard.
To improve and build upon the first visualization, I sought to formulate a story by improving the look of the visualization, making it more readable. I went about doing this by:
removing the geom_rug() that did not add much value
This updated version:
✅ Clearly shows tempo vs. arousal trends
✅ Uses color and size effectively
✅ Highlights overall trends with a dashed trend line
✅ Has a clean and readable layout
I decided to explore different genres that I like by asking different models for songs in various genres. I have always been a fan of post-punk and alternative rock music from the late 90s to early 2000s, such as Interpol, The Strokes, Bloc Party, Arctic Monkeys, Fontaines D.C. With the help of AI, I came up with this description to use as a prompt for gen AI music models: “a high-energy alt rock / post-punk song with a melodic bassline, intricate drumming, and sharp and rhythmic guitar work, reminiscent of the bands Interpol and Bloc Party. dynamic, with tension-building verses leading into an explosive, anthemic chorus. create a sense of depth and intensity.” I also used this shortened version for models with character limits: “Post-Punk, Driving Melodic Bassline, Angular Reverb-Drenched Guitars, Punchy Dynamic Drumming, Moody Detached Vocals, Urgent & Anthemic, Dark Yet Energetic, Tension-Building Composition, 140 BPM”.
I have also recently been enjoying deep house music, so I chose this genre for one of my songs. I used the following prompts: “deep house song that has a hypnotic beat, gradually layering warm synths, deep basslines, and subtle percussive beats, with a steady, entrancing rhythm. slow, cinematic build-ups that evoke nostalgia and euphoria. Incorporate atmospheric pads and a shimmering, time-dissolving feel of the track, with immersive, and emotionally uplifting verses and bridges, suitable for a sunset in the mountains” and “Deep House, Hypnotic Synth Pads, Pulsing Bassline, Rolling Four-on-the-Floor Groove, Atmospheric Textures, Slow-Building Progression, Dreamy Vocal Samples, Cinematic, Nostalgic, Expansive, 120 BPM”.
I also wanted to try a different genre that I enjoy, something along the lines of Lana Del Rey’s style, so I used the following prompts: “A cinematic baroque / dream pop composition that blends dreamy electronic synths with classical orchestration. Feature violins, melancholic clarinets, and rich trumpet swells, weaving through ethereal synth pads and delicate, reverberated piano. The rhythm should be slow and hypnotic, with a hazy, dreamlike quality. The vocals should be intimate yet grand, drenched in vintage-style reverb, with poetic, melancholic lyrics evoking themes of romance, nostalgia, and faded Hollywood glamour. Think of Lana Del Rey’s storytelling style, but with a modern dream pop twist—layered harmonies, sweeping crescendos, and an air of cinematic longing.” and “Baroque Pop / Dream Pop, Ethereal Synth Pads, Sweeping Violin & Clarinet Arrangements, Melancholic Trumpet Swells, Reverb-Drenched Intimate Vocals, Vintage Aesthetic, Poetic & Nostalgic, Cinematic & Grand, 80 BPM”.
I explored the outputs of these prompts from various models, including Suno, Stable Audio, Beatoven.ai, Soundverse.ai, Udio, and Mubert. Although I was hesitant to try Suno and Udio given their use of artists’ music without compensating them, I wanted to see whether there would be any differences in quality, production output, relevance to the prompt, and similarity to my expectations and to existing songs. I found the vocals and lyrics from most models to be of somewhat lower quality than I expected, with many songs sounding unnatural or, understandably, AI-generated.
My First Track
Description:
For my first track, I chose the Stable Audio deep house track because it best matched my expectations: emotive and intense, yet calming and not overly elaborate.
My Second Track
Description:
For my second track, I chose another deep house song, which I generated on Suno.